Abstract

Two languages are finitely different if their symmetric difference is finite. We consider the DFAs of finitely different regular languages and find major structural similarities. We proceed to consider the smallest DFAs that recognize a language finitely different from the language of some given DFA. Such f-minimal DFAs are not unique, and this non-uniqueness is characterized. Finally, we offer a solution to the minimization problem of finding such f-minimal DFAs.

1 Preliminaries 

A DFA is a quintuple $(Q, \Sigma, \delta, q_0, A)$ following the standard definition [1], where $Q$ is the set of states, $\Sigma$ is the alphabet, $\delta$ is the transition function, $q_0$ is the starting state, and $A$ is the set of accepting states.

We extend the transition function $\delta$ to words in the standard way. We only consider DFAs in which all states are reachable. By default, $D$ and $D'$ refer to DFAs, with $D = (Q, \Sigma, \delta, q_0, A)$ and $D' = (Q', \Sigma, \delta', q'_0, A')$, and $L$ and $L'$ refer to their languages. Finally, if $D$ is a DFA, then $L(D)$ is the language recognized by $D$.

2 Results 






The first subsection investigates the numerous similarities between DFAs that recognize finitely different languages. It contains the bulk of our results. The second subsection addresses a natural minimization problem: finding f-minimal DFAs. It contains a single theorem and the sketch of an algorithm.



2.1 Main Results 

Definition 1 (Finitely Different Languages). If the symmetric difference $L \triangle L'$ is a finite set, then $L$ and $L'$ are finitely different and we write $L \sim L'$.

This paper investigates the DFAs of finitely different languages. Note that the set of regular languages is closed under finite difference: if $L$ is regular and $L \sim L'$, then $L'$ is regular, since $L' = L \triangle (L \triangle L')$ and finite languages are regular.

Definition 2 (Equivalence Classes). Finite difference is an equivalence relation. The equivalence classes of this relation are called language-classes. In a natural way, we extend this relation to DFAs, writing $D \sim D'$ if $L(D) \sim L(D')$; each DFA is likewise a member of some equivalence class, called its DFA-class.

Definition 3 (Finite Part and Infinite Part). For any DFA $D = (Q, \Sigma, \delta, q_0, A)$, $Q$ is partitioned into two sets of states: the finite part and the infinite part. To aid understanding, we offer two equivalent definitions of the finite and infinite parts:

1. For every state $q \in Q$, consider the set $\{w \in \Sigma^* \mid \delta(q_0, w) = q\}$. If this set is finite, $q$ is in the finite part of $D$, denoted by $F(D)$. If this set is infinite, $q$ is in the infinite part of $D$, denoted by $I(D)$.

2. A state $q \in Q$ is in the infinite part iff it is either on a cycle (that is, $\exists w \in \Sigma^+ : \delta(q, w) = q$) or reachable from a state which is on a cycle.
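The second characterization lends itself to a direct computation. The Python sketch below is illustrative only: it assumes a hypothetical representation (ours, not the text's) in which the transition function is a dict `delta` from `(state, symbol)` pairs to states, and, as in Section 1, that every state is reachable from the start state. It marks a state as cyclic when it can be reached from one of its own successors, then closes that set under reachability.

```python
from collections import deque


def reachable(delta, alphabet, source):
    """States reachable from `source` by words of length >= 0 (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        q = queue.popleft()
        for c in alphabet:
            r = delta[(q, c)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def split_parts(states, alphabet, delta):
    """Return (F(D), I(D)) using part 2 of Definition 3."""
    # q is on a cycle iff q is reachable from one of its own successors,
    # i.e. delta(q, w) = q for some nonempty word w.
    on_cycle = {q for q in states
                if any(q in reachable(delta, alphabet, delta[(q, c)])
                       for c in alphabet)}
    # The infinite part is everything reachable from a state on a cycle.
    infinite = set()
    for q in on_cycle:
        infinite |= reachable(delta, alphabet, q)
    return set(states) - infinite, infinite


# Minimized DFA for 10* over {0, 1}: s --1--> a, a loops on 0, "dead" is a sink.
delta_10 = {("s", "0"): "dead", ("s", "1"): "a",
            ("a", "0"): "a", ("a", "1"): "dead",
            ("dead", "0"): "dead", ("dead", "1"): "dead"}
finite, infinite = split_parts({"s", "a", "dead"}, "01", delta_10)
```

In this example only the start state `s` lies in the finite part, since it is reached by the empty word alone.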

Definition 4 (Infinite Part Isomorphism). Two DFAs $D = (Q, \Sigma, \delta, q_0, A)$ and $D' = (Q', \Sigma, \delta', q'_0, A')$ are said to have isomorphic infinite parts, denoted by $D \cong_I D'$, if there exists a bijection $f : I(D) \to I(D')$ such that

1. $(\forall q \in I(D))$, $q \in A \iff f(q) \in A'$, and

2. $(\forall q \in I(D), \forall c \in \Sigma)$, $f(\delta(q, c)) = \delta'(f(q), c)$.
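Given a candidate map $f$, the two conditions of Definition 4 can be checked mechanically. A minimal sketch under an assumed dict representation (transition dicts keyed by `(state, symbol)`); the function name and argument order are hypothetical, and it only verifies a supplied map rather than searching for one.

```python
def is_infinite_part_isomorphism(f, inf1, inf2, alphabet, delta1, delta2, acc1, acc2):
    """Check Definition 4 for a candidate map f: I(D) -> I(D'), given as a dict."""
    # f must be a bijection from I(D) onto I(D').
    if set(f) != set(inf1) or set(f.values()) != set(inf2) or len(inf1) != len(inf2):
        return False
    for q in inf1:
        # Condition 1: q is accepting iff f(q) is accepting.
        if (q in acc1) != (f[q] in acc2):
            return False
        # Condition 2: f commutes with every transition. delta1[(q, c)] stays
        # in I(D) because the infinite part is closed under transitions.
        for c in alphabet:
            if f[delta1[(q, c)]] != delta2[(f[q], c)]:
                return False
    return True


# Minimized DFAs for 0* and 10* over {0, 1} (the pair used in Proposition 6).
delta_0 = {("p", "0"): "p", ("p", "1"): "d", ("d", "0"): "d", ("d", "1"): "d"}
delta_10 = {("s", "0"): "e", ("s", "1"): "a", ("a", "0"): "a", ("a", "1"): "e",
            ("e", "0"): "e", ("e", "1"): "e"}
ok = is_infinite_part_isomorphism({"p": "a", "d": "e"}, {"p", "d"}, {"a", "e"},
                                  "01", delta_0, delta_10, {"p"}, {"a"})
```

For these two DFAs, the map sending the accepting loop state to the accepting loop state and the sink to the sink passes both checks; swapping the two images fails the acceptance condition.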

Theorem 5 (Infinite Part Isomorphism). If $D$ and $D'$ are minimized and $D \sim D'$, then $D \cong_I D'$.

Proof. Let $D$ and $D'$ be minimized DFAs whose languages ($L$ and $L'$) are finitely different. For $D$, there is some word length above which all input strings "end up in" the infinite part. That is, there exists a $k$ such that $|w| > k \implies \delta(q_0, w) \in I(D)$. Likewise for $D'$. Furthermore, since the languages have only a finite difference, there is some word length above which the languages are identical. Let $N$ be the maximum of these three numbers.

With each state $q \in I(D)$, we associate a representative string $w_q$ such that $\delta(q_0, w_q) = q$ and $|w_q| > N$. Strings of sufficient length must exist, since infinitely many strings reach $q$. Now consider the function $f : I(D) \to I(D')$ defined by $f(q) = \delta'(q'_0, w_q)$. We will show that $f$ is an infinite part isomorphism.

Let $q_1 \neq q_2 \in I(D)$ and let $w_1$ and $w_2$ be their representative strings. Since $D$ is minimized, there is a string $t$ such that $w_1 t \in L$ iff $w_2 t \notin L$. Since $|w_1|, |w_2| > N$, clearly $|w_1 t|, |w_2 t| > N$, and therefore $w_1 t \in L'$ iff $w_2 t \notin L'$ by the definition of $N$. This means that $\delta'(q'_0, w_1 t) \neq \delta'(q'_0, w_2 t)$, which implies that $f(q_1) = \delta'(q'_0, w_1) \neq \delta'(q'_0, w_2) = f(q_2)$. Hence, $f$ is an injection. We can interchange $D$ and $D'$, and choose representative strings for $I(D')$ to obtain an injection $f' : I(D') \to I(D)$. Therefore $I(D)$ and $I(D')$ have the same cardinality and $f$ is a bijection. To complete the proof, we show that $f$ satisfies the two conditions of Definition 4:

1. (Transitions.) We use a proof by contradiction. Consider any $x \in I(D)$ and $c \in \Sigma$. Let $y = \delta(x, c)$, and let $z \in I(D)$ be such that $f(z) = \delta'(f(x), c)$; such a $z$ exists because $f$ is a bijection and $\delta'(f(x), c) \in I(D')$. Suppose that $f(y) \neq f(z)$. Then $y \neq z$, so, since $D$ is minimized, there exists some distinguishing string $d$ between them. If $w_x$ and $w_z$ are representative strings for $x$ and $z$ respectively, then $w_x c d \in L$ iff $w_z d \notin L$. But in $D'$, $w_x c$ and $w_z$ go to the same state $f(z)$, so $w_x c d \in L'$ iff $w_z d \in L'$. We are forced to conclude that $D$ and $D'$ disagree on one of $w_x c d$ and $w_z d$, but this contradicts our choice of $N$. Hence $f(\delta(x, c)) = \delta'(f(x), c)$.

2. (Acceptance.) Let $q \in I(D)$. Since $|w_q| > N$, $w_q \in L$ iff $w_q \in L'$. Hence, by the definition of $f$, $q \in A$ iff $f(q) \in A'$.
□

Proposition 6. The converse of Theorem 5 is false. 

Proof. Consider the minimized DFAs for $0^*$ and $10^*$ over the alphabet $\{0, 1\}$. Their infinite parts are isomorphic, but the languages differ on infinitely many strings. □

Definition 7 (Induced languages). Consider a DFA $D = (Q, \Sigma, \delta, q_0, A)$. The language induced by $q \in Q$ is the language recognized by the DFA $(Q, \Sigma, \delta, q, A)$. This language is denoted by $L(q)$. We extend the finite difference relation to states: if $L(p) \sim L(q)$, then $p \sim q$, and $p$ and $q$ are members of the same state-class.

Definition 8 ($S(D)$ and $Q_C(D)$). For any DFA $D$, define $S(D) = \{[L(q)] : q \in Q\}$, where $[L]$ denotes the language-class of $L$. For any language-class $C \in S(D)$, let $Q_C(D)$ denote the set of states of $D$ inducing a language in $C$.
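Computing $S(D)$ and the sets $Q_C(D)$ requires deciding $p \sim q$ for pairs of states. A minimal Python sketch, assuming a hypothetical representation (ours) in which the transition function is a dict `delta` from `(state, symbol)` pairs to states: it explores the cross-product DFA for $L(p) \triangle L(q)$ (the construction described for Algorithm 24), but instead of minimizing that DFA it uses the standard fact that a DFA accepts infinitely many words iff some reachable accepting state lies on, or is reachable from, a cycle. The helper names are illustrative.

```python
from collections import deque


def reach(succ, src):
    """All nodes reachable from src (src included) under the successor map."""
    seen, queue = {src}, deque([src])
    while queue:
        for r in succ(queue.popleft()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def finitely_different(alphabet, delta, acc, p, q):
    """Decide L(p) ~ L(q) via the cross-product DFA started at (p, q)."""
    def succ(pair):
        x, y = pair
        return [(delta[(x, c)], delta[(y, c)]) for c in alphabet]

    pairs = reach(succ, (p, q))
    # Product states on a cycle, and everything reachable from them.
    on_cycle = [s for s in pairs if any(s in reach(succ, t) for t in succ(s))]
    infinite_part = set().union(*(reach(succ, s) for s in on_cycle))
    # (x, y) accepts L(p) xor L(q) iff exactly one of x, y is accepting;
    # the difference is finite iff no such pair is in the infinite part.
    return all((x in acc) == (y in acc) for x, y in infinite_part)


def state_classes(states, alphabet, delta, acc):
    """Group states into state-classes; ~ is an equivalence relation, so
    comparing against one representative per class suffices."""
    classes = []
    for q in states:
        for cls in classes:
            if finitely_different(alphabet, delta, acc, q, cls[0]):
                cls.append(q)
                break
        else:
            classes.append([q])
    return classes


# DFA for 0+ over {0, 1}: u --0--> v, v loops on 0, "dead" is a sink.
delta = {("u", "0"): "v", ("u", "1"): "dead", ("v", "0"): "v", ("v", "1"): "dead",
         ("dead", "0"): "dead", ("dead", "1"): "dead"}
classes = state_classes(["u", "v", "dead"], "01", delta, {"v"})
```

Here $L(u) = 0^+$ and $L(v) = 0^*$ differ only on the empty word, so $u$ and $v$ share a state-class, while the sink forms its own.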

Theorem 9. If $D \sim D'$, then $S(D) = S(D')$.

Proof. Suppose $S(D) \neq S(D')$, and without loss of generality let $C \in S(D) \setminus S(D')$. For some $q \in Q_C(D)$, let $w$ be a word such that $\delta(q_0, w) = q$. Let $q' = \delta'(q'_0, w)$. Since $[L(q')] \neq C$, the set $W = L(q) \triangle L(q')$ is infinite. Since $D$ and $D'$ disagree on every word of the form $wd$ with $d \in W$, $D \not\sim D'$. □

Proposition 10. The converse of Theorem 9 is false. 

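Proof. Consider two-state DFAs $D$ and $D'$ with $L(D) = \{w : |w| \text{ is odd}\}$ and $L(D') = \{w : |w| \text{ is even}\}$. Each has one state inducing the odd-length words and one inducing the even-length words, so $S(D) = S(D')$, but the DFAs disagree on infinitely many strings (in fact, on all of them). □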

Lemma 11. If $D_q$ is the induced DFA of $q \in Q$ in some DFA $D$, then $I(D_q) \subseteq I(D)$.

Proof. Let $w$ be a word such that $\delta(q_0, w) = q$. Then for any state $q' \in Q$, $\delta(q, w') = q' \implies \delta(q_0, ww') = q'$. Therefore, if any state $q'$ can be reached from $q$ by infinitely many strings, then by prepending $w$ to those strings it is clear that $q'$ can also be reached from $q_0$ by infinitely many strings. □

Proposition 12. If $D$ and $D'$ are minimized DFAs, then $S(D) = S(D') \implies D \cong_I D'$.

Proof. Suppose $S(D) = S(D')$. Then there must exist some state $q' \in Q'$ such that $q_0 \sim q'$, where $q_0$ is the start state of $D$. Let $D'_{q'}$ be the induced DFA of $q'$. By Lemma 11, $I(D'_{q'}) \subseteq I(D')$, hence $|I(D'_{q'})| \leq |I(D')|$. Since $q_0 \sim q'$, we have $D \sim D'_{q'}$, so by Theorem 5, $D \cong_I D'_{q'}$ and $|I(D)| = |I(D'_{q'})|$. Combining the two results gives $|I(D)| \leq |I(D')|$, and by symmetry $|I(D')| \leq |I(D)|$, so $|I(D)| = |I(D')|$. Therefore, $I(D'_{q'}) = I(D')$ and $D \cong_I D'$. □

Proposition 13. The converse of Proposition 12 is false. 

Proof. Consider the minimized DFAs for $0^*$ and $10^*$. Their infinite parts are isomorphic, but no state in the former is in the same state-class as the start state of the latter. □
Remark 14. Through Proposition 13, we have fully articulated the relationships between finite difference, $S(D)$ equivalence, and infinite-part isomorphism. In summary, for minimized DFAs, $D \sim D' \implies S(D) = S(D') \implies D \cong_I D'$, and none of the reverse implications is true. As partitions of the set of all minimized DFAs, each is a proper refinement of the next.

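Definition 15 (f-merge). The f-merge operation combines two states of a DFA, given $p, q \in Q$ with $p \sim q$, $p \in F(D)$, and $p$ not reachable from $q$. To f-merge $p$ and $q$, delete $p$ and, whenever $\delta(x, c) = p$, replace the transition with $\delta(x, c) = q$. Note that since $p \in F(D)$, it is impossible for $\delta(p, c) = p$. The reachability condition holds automatically when $q \in I(D)$, since every state reachable from the infinite part is itself in the infinite part; without it, merging can create a new cycle (in the minimized DFA for $\{\varepsilon, a, aa\}$ over $\{a\}$, f-merging the state reached by $aa$ into the state reached by $a$ yields a DFA for $a^*$).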
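The operation itself is a small rewrite of the transition table. The sketch below assumes a hypothetical dict representation (transition dict keyed by `(state, symbol)`) and leaves the preconditions $p \in F(D)$ and $p \sim q$ to the caller; it does refuse to merge when $p$ is reachable from $q$, since redirecting would then close a cycle through $q$. Two behaviors are our own assumptions, not part of Definition 15: if $p$ is the start state, $q$ becomes the start state, and states that become unreachable are not removed.

```python
from collections import deque
from itertools import product


def reachable(delta, alphabet, source):
    """States reachable from `source` (BFS)."""
    seen, queue = {source}, deque([source])
    while queue:
        q = queue.popleft()
        for c in alphabet:
            r = delta[(q, c)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def f_merge(states, alphabet, delta, start, acc, p, q):
    """f-merge p into q. Assumes (unchecked) that p is in F(D) and p ~ q."""
    if p in reachable(delta, alphabet, q):
        raise ValueError("p is reachable from q; merging would close a cycle")
    # Delete p's outgoing transitions and redirect every transition into p to q.
    new_delta = {(x, c): (q if y == p else y)
                 for (x, c), y in delta.items() if x != p}
    new_start = q if start == p else start  # assumption: not covered by the text
    return [s for s in states if s != p], new_delta, new_start, set(acc) - {p}


def accepts(delta, start, acc, word):
    state = start
    for c in word:
        state = delta[(state, c)]
    return state in acc


# DFA for 0* ∪ {1}: s --0--> z (loops on 0), s --1--> t, t goes to the sink d.
delta = {("s", "0"): "z", ("s", "1"): "t", ("z", "0"): "z", ("z", "1"): "d",
         ("t", "0"): "d", ("t", "1"): "d", ("d", "0"): "d", ("d", "1"): "d"}
acc = {"s", "z", "t"}
# t is in the finite part and L(t) = {empty word} ~ L(d) = {}, so merge t into d.
states2, delta2, start2, acc2 = f_merge(["s", "z", "t", "d"], "01", delta, "s", acc, "t", "d")
words = ["".join(w) for k in range(5) for w in product("01", repeat=k)]
diff = {w for w in words
        if accepts(delta, "s", acc, w) != accepts(delta2, start2, acc2, w)}
```

Here $X = \{1\}$ and $Z = L(t) \triangle L(d) = \{\varepsilon\}$, so, in line with Lemma 16, exactly one word (`"1"`) changes status.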

Lemma 16. The f-merge operation makes only a finite difference in a DFA's language.



Proof. Suppose we are going to apply the f-merge operation to states $p, q$ of DFA $D_1$, turning it into $D_2$. Let $X$ be the set of words that go to $p$, and let $Z = L(p) \triangle L(q)$; both sets are finite, since $p \in F(D_1)$ and $p \sim q$. The presence in $L(D_1)$ of any word not passing through $p$ is unaffected. Considering a word of the form $xw$ for $x \in X$, we see that unless $w \in L(p) \triangle L(q)$, the status of $xw$ with respect to $L(D_1)$ will not change. Hence $|L(D_1) \triangle L(D_2)| \leq |X \cdot Z| \leq |X||Z| < \infty$. So $D_1 \sim D_2$. □

Definition 17 (f-minimal). $D$ is f-minimal if for any $D'$, $D \sim D' \implies |Q| \leq |Q'|$.

Lemma 18. In an f-minimal DFA, each state in the finite part is the sole representative of its state-class. In other words, if $D$ is f-minimal with $p \in F(D)$, then $p \sim q \implies p = q$.



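Proof. If $p \in F(D)$, $p \sim q$, and $p \neq q$, then $p$ and $q$ can be f-merged (if $q \in F(D)$ as well, in at least one direction, since $p$ and $q$ cannot each be reachable from the other without lying on a cycle). By Lemma 16, this would result in a smaller DFA of the same DFA-class, meaning $D$ could not be f-minimal. □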

Definition 19 (Isomorphic Finite Part). $D$ and $D'$ are said to have isomorphic finite parts up to acceptance if there exists a bijective function $f : F(D) \to F(D')$ such that $(\forall q_x, q_y \in F(D))(\forall c \in \Sigma)$, $\delta(q_x, c) = q_y \implies \delta'(f(q_x), c) = f(q_y)$.
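Definition 19 can be checked like Definition 4, except that acceptance is ignored and only transitions that stay inside the finite part are constrained. A hedged sketch, assuming a hypothetical dict representation (transition dicts keyed by `(state, symbol)`); the function name is ours, and it checks a supplied map rather than constructing one.

```python
def finite_parts_isomorphic(f, fin1, fin2, alphabet, delta1, delta2):
    """Check Definition 19 for a candidate map f: F(D) -> F(D') given as a dict."""
    # f must be a bijection from F(D) onto F(D').
    if set(f) != set(fin1) or set(f.values()) != set(fin2) or len(fin1) != len(fin2):
        return False
    # Whenever delta(q_x, c) = q_y inside F(D), the image transition must
    # land on f(q_y). Acceptance is deliberately not compared.
    return all(delta2[(f[qx], c)] == f[delta1[(qx, c)]]
               for qx in fin1 for c in alphabet
               if delta1[(qx, c)] in fin1)


# DFA for 110* over {0, 1}: s --1--> t --1--> a, a loops on 0, d is a sink.
delta = {("s", "0"): "d", ("s", "1"): "t", ("t", "0"): "d", ("t", "1"): "a",
         ("a", "0"): "a", ("a", "1"): "d", ("d", "0"): "d", ("d", "1"): "d"}
# Comparing the DFA with a copy that may differ only in acceptance values
# (which Definition 19 ignores), so the same transition table is passed twice.
same = finite_parts_isomorphic({"s": "s", "t": "t"}, {"s", "t"}, {"s", "t"}, "01", delta, delta)
swapped = finite_parts_isomorphic({"s": "t", "t": "s"}, {"s", "t"}, {"s", "t"}, "01", delta, delta)
```

The identity map passes, while swapping `s` and `t` fails on the transition $\delta(s, 1) = t$.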

Theorem 20. If $D$ and $D'$ are f-minimal and $D \sim D'$, then their finite parts are isomorphic up to acceptance.



Proof. First, by Theorem 9, $S(D) = S(D')$. Second, since all f-minimal DFAs are minimized, Theorem 5 gives $D \cong_I D'$, so the state-classes represented by $I(D)$ are the same as those represented by $I(D')$. By Lemma 18, no state-class is represented in both the finite and the infinite part, so by subtraction, the state-classes represented by $F(D)$ are the same as those represented by $F(D')$. By Lemma 18, or by noting that $|Q| = |Q'|$ and $|I(D)| = |I(D')|$, we may conclude that $|F(D)| = |F(D')|$. Therefore, we construct our bijection $f : F(D) \to F(D')$ by mapping each state in $F(D)$ to the state in $F(D')$ whose induced language is in the same language-class. Consider any $p, q \in F(D)$ and $c \in \Sigma$ where $\delta(p, c) = q$. The languages of $p$ and $f(p)$ differ on only finitely many strings. Since every difference between the induced languages of $\delta(p, c)$ and $\delta'(f(p), c)$ causes a difference between the induced languages of $p$ and $f(p)$ (one that begins with $c$), we conclude that $L(\delta(p, c)) \sim L(\delta'(f(p), c))$. Since the state-class of $q$ is represented only in the finite parts, Lemma 18 applied to $D'$ gives $\delta'(f(p), c) = f(q)$, as required. □

Remark 21 (Non-uniqueness of f-minimal DFAs). Through the finite- and infinite-part isomorphism theorems, we have shown that there must be major structural similarities between any two f-minimal DFAs of the same DFA-class. Only two aspects have not been shown to be equal: the acceptance values of states in the finite part and the transitions that go from a finite-part state to an infinite-part state. Indeed, both of these aspects may be altered. The acceptance values of states in the finite part can be altered arbitrarily while affecting neither DFA-class nor f-minimality. As for the finite-part to infinite-part transitions, f-minimal DFAs within a class can differ on this aspect as well. However, an argument similar to that of Theorem 20 shows that these transitions can only swap destinations within a single state-class (i.e., when there are multiple infinite-part states in the same state-class, transitions into that state-class may permute with each other). Furthermore, such a swap will preserve both DFA-class and f-minimality, while any other swap will not, so this is the best possible result.

The previous results may suggest that finite language differences originate with finite-part differences. However, they may also occur when infinite parts have multiple states in the same state-class. The final result of this section demonstrates how extreme this can be.

Proposition 22. For any finite set of words $W$ over an alphabet with at least two characters, there exist minimized DFAs $D$ and $D'$ with $F(D) = F(D') = \emptyset$ and $L(D) \triangle L(D') = W$.

Proof. Let $W$ be an arbitrary finite subset of $\Sigma^*$ for some alphabet with $|\Sigma| \geq 2$. Let $n = \max\{|w| : w \in W\}$. We will prove the proposition by construction, and $D$ and $D'$ will be identical except for the starting state. The alphabet $\Sigma$ is already determined. Now, letting $\Sigma_{\leq x}$ and $\Sigma_x$ be the sets of words of length at most $x$ and exactly $x$, respectively, we set $Q = \Sigma_{\leq n} \times \{0, 1\}$. Fixing a surjection $\varphi : \Sigma_{n+1} \to \{(\varepsilon, 0), (\varepsilon, 1)\}$ (such a function must exist since $|\Sigma| \geq 2$), we set $\delta$ as follows:

$$\delta((w, i), c) = \begin{cases} (wc, i) & \text{if } |w| < n, \\ \varphi(wc) & \text{if } |w| = n. \end{cases}$$

Let $A = \{(w, i) : i = 1 \text{ and } w \in W\}$. Setting $D = (Q, \Sigma, \delta, (\varepsilon, 0), A)$ and $D' = (Q, \Sigma, \delta, (\varepsilon, 1), A)$ completes our construction. It remains to prove that $F(D) = F(D') = \emptyset$ and $L(D) \triangle L(D') = W$, and that these properties are preserved by minimization.

To prove the first property, it suffices to show that the starting states are on a cycle. We begin with $D$. Since $\varphi$ is surjective, let $w_0$ be any word with $\varphi(w_0) = (\varepsilon, 0)$. Then we have $\delta((\varepsilon, 0), w_0) = \varphi(w_0) = (\varepsilon, 0)$. Therefore $(\varepsilon, 0) \in I(D)$, and since every state reachable from $(\varepsilon, 0)$ (that is, every state) is also in $I(D)$, $F(D) = \emptyset$. Since a DFA's language is unchanged by minimization, the starting state $q_0$ and $\delta(q_0, w_0)$ of the minimized DFA still induce the same language. In any minimized DFA, $L(p) = L(q) \implies p = q$, so $q_0 = \delta(q_0, w_0)$ and the starting state is still on a cycle. Therefore, $F(D) = \emptyset$ before and after minimization. By a symmetrical proof, the same holds for $F(D')$.

To prove the second property, begin by considering any word $w$ with $|w| \leq n$. It should be clear that $\delta((\varepsilon, i), w) = (w, i)$. Therefore, by the definition of $A$ (no state $(w, 0)$ is accepting, while $(w, 1)$ is accepting iff $w \in W$), $w \in L(D) \triangle L(D')$ iff $w \in W$. Continuing, for any word $w$ with $|w| = n + 1$, we have $\delta((\varepsilon, 0), w) = \delta((\varepsilon, 1), w) = \varphi(w)$. Since $D$ and $D'$ go to the same state on any word of length $n + 1$, they also go to the same state on any word of length greater than $n + 1$. Therefore, $D$ and $D'$ agree on every word $w$ with $|w| \geq n + 1$, so $L(D) \triangle L(D') = W$, as desired. Finally, since minimization does not change the language of a DFA, this property too is preserved. □
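The construction in this proof can be checked by brute force on small cases. The sketch below builds the shared transition function as a Python function rather than a table; the specific surjection $\varphi$ (sending a word to copy 0 iff its last letter is the first alphabet symbol) is our own choice, as is setting $n = 0$ when $W$ is empty. It only tests words up to a fixed length, so it illustrates, rather than proves, that $L(D) \triangle L(D') = W$.

```python
from itertools import product


def build_prop22(alphabet, W):
    """Return (delta, is_accepting) for the DFA of Proposition 22.
    States are pairs (w, i) with |w| <= n and i in {0, 1}."""
    assert len(alphabet) >= 2, "the construction needs at least two symbols"
    n = max((len(w) for w in W), default=0)  # assumption: n = 0 if W is empty

    def phi(word):
        # A surjection from words of length n + 1 onto {("", 0), ("", 1)}.
        return ("", 0 if word[-1] == alphabet[0] else 1)

    def delta(state, c):
        w, i = state
        return (w + c, i) if len(w) < n else phi(w + c)

    def is_accepting(state):
        w, i = state
        return i == 1 and w in W

    return delta, is_accepting


def accepts(delta, is_accepting, start, word):
    state = start
    for c in word:
        state = delta(state, c)
    return is_accepting(state)


W = {"b", "ab", "ba"}
delta, is_accepting = build_prop22("ab", W)
# D starts in ("", 0) and D' in ("", 1); compare them on all words of length <= 6.
words = ["".join(t) for k in range(7) for t in product("ab", repeat=k)]
diff = {w for w in words
        if accepts(delta, is_accepting, ("", 0), w)
        != accepts(delta, is_accepting, ("", 1), w)}
```

With $n = 2$, the word `aaa` plays the role of $w_0$: it leads from $(\varepsilon, 0)$ back to $(\varepsilon, 0)$, so the start state of $D$ lies on a cycle.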
2.2 Algorithm



In this section, we address the minimization problem posed by the concept of f-minimality: given a starting DFA, how can one find an f-minimal DFA in the same DFA-class?

Theorem 23 (No Local Minima Under F-Merge). Greedy, repeated application of the f-merge operation to any minimized initial DFA will result in an f-minimal DFA of the same DFA-class as the original.

Proof. Let $D_1$ be the original minimized DFA. Since a DFA has finitely many states, f-merge can only be applied finitely many times, as each application reduces the number of states. Let $D_1, \ldots, D_n$ be the sequence of DFAs reached by applying f-merge, such that $D_{k+1}$ is the result of a single application of f-merge to $D_k$, and no f-merge is possible in $D_n$. Let $D_z$ be an f-minimal DFA in the same DFA-class as $D_1, \ldots, D_n$. Suppose for contradiction that $D_z$ has fewer states than $D_n$. By Theorem 9, $S(D_n) = S(D_z)$. So there must exist some class $C \in S(D_n) = S(D_z)$ such that $Q_C(D_z)$ has fewer states than $Q_C(D_n)$. Consider the number of states from $F(D_n)$ and from $I(D_n)$ in $Q_C(D_n)$. If the latter is positive, then the former must be zero, or else any finite-part state in $Q_C(D_n)$ could be f-merged with an infinite-part state, contradicting our assumption that no more f-merges could be performed in $D_n$. But by Theorem 5, $D_n \cong_I D_z$, so the number of states from $I(D_n)$ in $Q_C(D_n)$ must equal the number of states from $I(D_z)$ in $Q_C(D_z)$. Therefore, there can be no states from $I(D_n)$ in $C$ (otherwise $Q_C(D_n)$ would consist of exactly these infinite-part states, and $Q_C(D_z)$ would have at least as many). But by the argument of Lemma 18, there must be exactly one state from $F(D_n)$ in $C$, since any two such states could be f-merged. Since $D_z$ must have at least one state in $C$ (by Theorem 9), there is no way it could have fewer states in $C$ than $D_n$ does, contradicting our assumption that $D_n$ was not f-minimal. □

Algorithm 24 (F-Minimize). Theorem 23 immediately yields an algorithm for f-minimizing any DFA, that is, turning it into an f-minimal DFA in the same DFA-class. This algorithm is surely suboptimal, so we only sketch it. The input is a DFA $D = (Q, \Sigma, \delta, q_0, A)$.

1. Minimize $D$ using any minimization algorithm.

2. Divide $Q$ into the finite and infinite parts.

3. For each pair of states $p, q$, determine whether $p \sim q$.

4. Within each state-class, f-merge any $p, q$ pair where $p \in F(D)$.

The first step is standard. The second step can be accomplished by determining for each state $q$, using either depth- or breadth-first search, the set of all states reachable from $q$, and then applying the second part of Definition 3. The third step can be accomplished by, for each $p$ and $q$, creating a DFA recognizing the language $L(p) \triangle L(q)$. This is done by using the standard $Q \times Q$ cross-product construction with $D_p = (Q, \Sigma, \delta, p, A)$ and $D_q = (Q, \Sigma, \delta, q, A)$ as inputs, where state $(x, y)$ is accepting if $x \in A$ xor $y \in A$. The resultant DFA is $D_{pq}$, and $p \sim q$ if after minimization $D_{pq}$ has infinite part equal to a single non-accepting state with all transitions leading to itself. (DFAs with this property recognize finite languages, and if $L(D_{pq})$ is finite then by construction $p \sim q$.) After performing the fourth step, Theorem 23 proves that the resultant DFA will be f-minimal. Step 3 dominates the running time, as it involves the costly cross-product and minimization over all pairs of states. If $n = |Q|$, then Step 3 takes $O(n^4 \log n)$ time: $n^2$ to go through each pair of states, and $O(n^2 \log n)$ on each of those to minimize the cross-product DFA. We hope and believe that there is room for improvement on this algorithm.
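The four steps can be combined into a compact Python sketch. Several choices here are ours, not the text's: the DFA is a hypothetical dict `delta` keyed by `(state, symbol)`; minimization uses Moore-style partition refinement (the text allows any algorithm); the $\sim$ test in Step 3 replaces minimizing the cross-product DFA with the equivalent check that no accepting product state lies in its infinite part; and Step 4 is carried out greedily, one f-merge at a time in the spirit of Theorem 23, recomputing the parts and the $\sim$ tests after each merge, skipping merges where $p$ is reachable from $q$, moving the start state to $q$ when $p$ is the start state, and discarding states that become unreachable. It favors brevity over the running time discussed above.

```python
from collections import deque


def reach(succ, src):
    """All nodes reachable from src (src included) under the successor map."""
    seen, queue = {src}, deque([src])
    while queue:
        for r in succ(queue.popleft()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def infinite_part(nodes, succ):
    """Nodes on a cycle or reachable from one (Definition 3, part 2)."""
    cyclic = [s for s in nodes if any(s in reach(succ, t) for t in succ(s))]
    return set().union(*(reach(succ, s) for s in cyclic))


def minimize(sigma, delta, start, acc):
    """Step 1: keep reachable states, then Moore partition refinement."""
    Q = reach(lambda q: [delta[(q, c)] for c in sigma], start)
    block = {q: q in acc for q in Q}
    while True:
        # Split blocks by acceptance plus the blocks of all successors.
        sig = {q: (block[q],) + tuple(block[delta[(q, c)]] for c in sigma) for q in Q}
        if len(set(sig.values())) == len(set(block.values())):
            break
        block = sig
    ids = {}
    name = {q: ids.setdefault(block[q], len(ids)) for q in Q}
    new_delta = {(name[q], c): name[delta[(q, c)]] for q in Q for c in sigma}
    return set(ids.values()), new_delta, name[start], {name[q] for q in Q if q in acc}


def finitely_different(sigma, delta, acc, p, q):
    """Step 3: L(p) ~ L(q) iff the product DFA for L(p) xor L(q), started at
    (p, q), has no accepting state in its infinite part."""
    succ = lambda s: [(delta[(s[0], c)], delta[(s[1], c)]) for c in sigma]
    inf = infinite_part(reach(succ, (p, q)), succ)
    return all((x in acc) == (y in acc) for x, y in inf)


def f_minimize(sigma, delta, start, acc):
    """Steps 1-4 of Algorithm 24, with Step 4 applied one merge at a time."""
    Q, delta, start, acc = minimize(sigma, delta, start, acc)
    while True:
        succ = lambda s: [delta[(s, c)] for c in sigma]
        fin = Q - infinite_part(Q, succ)  # Step 2
        # Step 4: find any legal f-merge of a finite-part p into some q.
        pair = next(((p, q) for p in fin for q in Q
                     if q != p and p not in reach(succ, q)
                     and finitely_different(sigma, delta, acc, p, q)), None)
        if pair is None:
            return Q, delta, start, acc
        p, q = pair
        delta = {(x, c): (q if y == p else y) for (x, c), y in delta.items() if x != p}
        start = q if start == p else start  # assumption: see lead-in
        # Discard states that are no longer reachable, then repeat.
        Q = reach(succ, start)
        delta = {(x, c): y for (x, c), y in delta.items() if x in Q}
        acc = acc & Q


# DFA for 0* ∪ {1} over {0, 1}; an f-minimal DFA in its class has two states.
example = {("s", "0"): "z", ("s", "1"): "t", ("z", "0"): "z", ("z", "1"): "d",
           ("t", "0"): "d", ("t", "1"): "d", ("d", "0"): "d", ("d", "1"): "d"}
Q, delta_f, start_f, acc_f = f_minimize("01", example, "s", {"s", "z", "t"})
```

The result recognizes $0^*$: an accepting start state looping on 0 and a non-accepting sink, regardless of the order in which the greedy loop finds its merges.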